The identification of the Rosa S-locus provides new insights into the breeding and wild origins of continuous-flowering roses

Abstract This study aims to: (i) identify the Rosa S-locus controlling self-incompatibility (SI); (ii) test the genetic linkage of the S-locus with other loci controlling important ornamental traits, such as the continuous-flowering (CF) characteristic; (iii) identify the S-alleles (SC) of old Chinese CF cultivars (e.g. Old Blush, Slater's Crimson China) and examine changes in the frequency of cultivars carrying SC over the history of breeding; and (iv) identify wild species carrying SC-alleles to infer the wild origins of CF cultivars. We identified a new S-RNase (SC2) of Rosa chinensis in a contig from a genome database that has not yet been assigned to any of the seven chromosomes. Genetic mapping indicated that SC2 is allelic to the previously identified S-RNase (SC1) on chromosome 3. Pollination experiments with half-compatible pairs of roses confirmed that these S-RNases are the pistil determinants of SI. Segregation analysis of an F1 population indicated genetic linkage between the S-locus and the floral repressor gene KSN, whose non-functional allele ksn is responsible for the CF characteristic. A total of five S-alleles (SC1–5) were identified from old CF cultivars. The frequency of cultivars carrying SC increased dramatically after the introgression of ksn from Chinese into European cultivars and remains high (80%) in modern cultivars, suggesting that S-genotyping can make breeding more effective. Wild individuals carrying SC were found in Rosa multiflora (SC1), Rosa chinensis var. spontanea (SC3), and Rosa gigantea (SC2, SC4), supporting the hypothesis of a hybrid origin of CF cultivars and providing new evidence for the involvement of Rosa multiflora.


Introduction
The rose is one of the most popular ornamental plants worldwide, with a long history of breeding and cultivation, and it is important not only economically but also culturally. More than 30 000 rose cultivars have been developed [1], mainly by cross breeding. Patterns of inheritance are difficult to predict for most traits because roses are predominantly outcrossing and highly heterozygous [2]. As a consequence, the success of cross breeding in the rose has depended largely on chance and the experience of breeders, and considerable effort is required to produce new cultivars with desirable traits [3]. Advances in scientific knowledge of rose genetics have therefore been eagerly awaited. Researchers have developed molecular markers to construct genetic linkage maps of the rose, clarifying the genetic mechanisms underlying ornamental traits [4], and genome sequencing was completed in 2018 [5][6][7]. Key genetic factors controlling important ornamental traits, such as scent production [8,9], continuous flowering (CF) [10], and double flower (DF) [5,11], have been identified.
Although identification of the S-locus controlling self-incompatibility (SI) in roses is essential for improving breeding at the diploid level, it has not been fully elucidated. Gametophytic SI is controlled by a single S-locus with multiple alleles: when the S-allele carried by a pollen grain matches either of the two S-alleles of the pistil, the pollen is recognized as self and rejected [12]. The pistil S gene of Rosaceae encodes an extracellular ribonuclease, S-RNase, which acts as a cytotoxin in self-pollen tubes. The pollen S gene encodes an F-box protein involved in detoxifying non-self S-RNases [12]. The "collaborative non-self recognition" model was proposed for SI in the Solanaceous plant Petunia [13]: multiple F-box genes determine pollen specificity, and each targets a subset of non-self S-RNases for detoxification. SI in Rosaceae may follow this "non-self recognition by multiple factors" system [12], except in Prunus, which uses self-recognition by a single F-box gene [14]. In the genome databases of R. chinensis "Old Blush", two different regions have been proposed as the S-locus [5,15], both located on chromosome 3 within a few Mbp of each other. Vieira et al. [15] concluded that their candidate region was the true S-locus because its S-RNase (hereafter the SC1 S-RNase) is more similar to the Prunus S-RNase than is the S-RNase36 identified by Hibrand-Saint Oyant et al. [5]. Chen et al. [16] identified an S-locus-like region in the genome of Rosa rugosa whose S-RNase is orthologous to the SC1 S-RNase of Old Blush. Du et al. [17] identified the S-RNase controlling SI in Fragaria, which is also more similar to the SC1 S-RNase than to S-RNase36. These results indicate that the SC1 S-RNase is the true S gene controlling SI in Old Blush. However, validation of the Rosa S-locus by pollination experiments has been conducted with only a few individuals [15].
Pollination tests with many individuals are required to confirm this result. Furthermore, Vieira et al. [15] identified the SC1 S-RNase of Old Blush in the genome database of Raymond et al. [6] but failed to identify its allele in the other genome database, that of Hibrand-Saint Oyant et al. [5]. The S-RNase36 identified by Hibrand-Saint Oyant et al. [5] has a single intron and weak homology to the SC1 S-RNase, which has two introns. Because the two genome databases of Old Blush represent different haplotypes of chromosome 3 [18], the genome database of Hibrand-Saint Oyant et al. [5] should contain an as-yet unidentified S-RNase allele. Thus, the first objective of this study is (i) to identify the other S-RNase allele of Old Blush and to confirm the Rosa S-locus with strong evidence from a large number of pollination experiments.
The second objective of this study is (ii) to test the genetic linkage of the S-locus with other loci controlling important ornamental traits. Chromosome 3, where the Rosa S-locus is assumed to be located, also carries other loci controlling traits valuable in ornamental plants, such as CF [19], double flower [20], thornlessness [21], and resistance to black spot disease [22]. These previous studies found strongly skewed segregation of these traits and hypothesized that the underlying genes are genetically linked with the S-locus. Such linkage may have important consequences for rose breeding. For example, the CF characteristic is controlled by a single recessive locus on chromosome 3, and the underlying gene is the non-functional allele (ksn) of RoKSN [5,10], a floral repressor gene in roses [23]. Roses homozygous for the non-functional ksn allele are the targets of cross breeding for CF roses. However, if the ksn alleles of both parents are tightly linked to the same S-allele, breeding ksn-homozygous CF roses will be hampered by SI, because the pistil rejects pollen carrying the shared S-allele linked to ksn.
The third objective of this study is (iii) to determine the S-alleles (SC) of old Chinese CF cultivars and to clarify changes in the frequency of rose cultivars carrying SC over the history of rose breeding. The ksn allele conferring the CF characteristic in modern rose cultivars originated from old Chinese cultivars around 200 years ago [10,24]. The original Chinese cultivars that introduced the CF characteristic into modern roses are thought to be the four China roses, i.e. Slater's Crimson China, Parsons' Pink China (Old Blush), Hume's Blush Tea-scented China, and Parks' Yellow Tea-scented China [25]. After the introgression of ksn from China into Europe, the frequency of rose cultivars carrying SC should have increased, but it may then have progressively decreased over the 200-year history of rose breeding owing to hybridization with European cultivars. However, if SC is genetically linked with ksn, strong artificial selection on ksn [24] might have caused a ksn-associated increase in the frequency of SC during the history of rose breeding.
The fourth objective of this study is (iv) to identify candidate genotypes for the wild ancestors of old Chinese CF cultivars by using SC as markers. The introduction of Chinese CF cultivars is one of the most revolutionary events in the history of rose breeding [24,25], and the Chinese cultivars have had profound effects on the genetic basis of modern rose cultivars [26]. However, the wild ancestral origin of the old Chinese CF cultivars has not been completely elucidated. Molecular phylogenetic studies have indicated complex hybrid origins [27][28][29][30][31][32]. Whole-genome sequencing will play a critical role in elucidating the hybrid origin of old Chinese cultivars, but analyzing such massive data for many candidate species is laborious. There is also large intraspecific genetic variation [33,34], and natural hybrids between species may also exist. Because of the high genetic divergence of the plant S-locus [35], we expect S-allele-specific markers to be useful for pinpointing candidate wild ancestors of old cultivars.

Genome-wide identification of candidate S-RNase
We performed a genome-wide search for candidate S-RNase genes in the genome databases of Old Blush [5,6], R. multiflora [7], and R. rugosa [16,39] (Supplementary data Table D2) and identified 16 genes located on four chromosomes (Fig. 1a). Of these, seven genes were expressed in the pistil according to the pistil transcriptomes of Old Blush, R. multiflora, and R. rugosa (Fig. 1b). The candidate genes were named by chromosome number and a letter, e.g. 3D and 6A (Fig. 1a), and the genes of Old Blush, R. multiflora, and R. rugosa were prefixed with Rc, Rm, and Rg, respectively (Fig. 1c). Putative amino acid sequences were deduced from the cDNA sequences, and a molecular phylogenetic tree of the pistil-expressed, S-RNase-like genes was constructed, including the Prunus S-RNase (Fig. 1c). The results indicate that the 3D and 0A genes are the primary candidates for the S-RNase controlling SI in roses because of their (i) high expression levels in the pistil (Fig. 1b), (ii) similarity to the Prunus S-RNase (Fig. 1c), and (iii) high divergence among alleles (Fig. 1c). The 3D S-RNase gene of Old Blush, named "Rc3D_SC1" (Fig. 1c), is the same gene previously identified by Vieira et al. [15] as the candidate S-RNase in roses. We identified nine additional 3D S-RNase-like genes from the genome databases and pistil transcriptomes of R. multiflora and R. rugosa (Fig. 1c). One of them, "Rg3D_S18", is the same gene previously reported as the S-RNase (Chr4.718) in the S-locus of R. rugosa [16]. The high genetic divergence (64.8%) of the 3D S-RNase gene agrees well with the assumption that this gene encodes the true S-RNase controlling SI in roses (Table 1). One of the 3D genes of R. multiflora, "Rm3D_SC1", is identical to Rc3D_SC1 of Old Blush, indicating that R. multiflora and Old Blush share the same S-allele (SC1).
Table 1 footnote (fragment): expression levels in the pistil (Fig. 1b): ***, >100; **, 50–100; *, 10–50.
The clade containing the Prunus S-RNase and the Rosa 3D S-RNase also includes three other genes (6C, 3A, 0A; Fig. 1c). The 6C and 3A genes are located at genomic positions different from that of the 3D S-RNase (Fig. 1a), so they are not alleles of the 3D gene. The 6C and 3A genes have low or undetectable expression levels in pistils (Fig. 1b) and low genetic divergence between alleles (>94% identity; Table 1). One allele of the 6C gene of Old Blush [5] and an allele of the 6C gene of R. rugosa [39] are pseudogenes because of frameshift mutations (Supplementary data Table D2). The 3A gene is homozygous in Old Blush and could not be identified in either of the two R. rugosa genomes. These results indicate that the 6C and 3A genes are not functional S-RNase genes.
We hypothesized that the 0A gene is allelic to the 3D S-RNase gene. The 0A gene of Old Blush, named "Rc0A_SC2", was identified on chromosome 0 (i.e. a contig not assigned to any of the seven chromosomes) of the genome database of Hibrand-Saint Oyant et al. [5]. We identified five 0A S-RNase-like genes from the genome databases and pistil transcriptomes of R. multiflora and R. rugosa (Fig. 1c). Like the 3D S-RNase gene, these 0A S-RNase-like genes are strongly expressed in pistils (Fig. 1b) and exhibit high genetic divergence (62.9%; Table 1). According to qPCR analyses of different tissues and developmental stages (Supplementary information 2), both the 0A and 3D S-RNase genes are expressed specifically in the pistil. Analyses of the pistil transcriptomes of seven diploid R. multiflora plants showed that four plants (Rm08, Rm09, Rm28, Rm33) expressed one 3D gene and one 0A gene, whereas the other three plants (Rm27, Rm2, Rm1) expressed two 3D genes and no 0A gene (Table 1). This pattern also suggests that the 0A and 3D genes are allelic.
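The allelism argument in the preceding paragraph can be made explicit as a simple count check. The sketch below is illustrative only and assumes that every plant is heterozygous at the S-locus and expresses both of its S-RNase alleles; the per-plant counts are those reported in the text, while the function name `consistent_with_single_locus` is hypothetical.

```python
from typing import Dict, Tuple

# Per-plant counts from the pistil transcriptomes described in the text:
# plant -> (number of expressed 3D genes, number of expressed 0A genes)
EXPRESSED: Dict[str, Tuple[int, int]] = {
    "Rm08": (1, 1), "Rm09": (1, 1), "Rm28": (1, 1), "Rm33": (1, 1),
    "Rm27": (2, 0), "Rm2": (2, 0), "Rm1": (2, 0),
}


def consistent_with_single_locus(counts: Dict[str, Tuple[int, int]], ploidy: int = 2) -> bool:
    """Return True if every plant expresses exactly `ploidy` S-RNases in total.

    If 3D and 0A are alleles of one locus, a heterozygous diploid carries two
    S-RNases drawn from the combined 3D + 0A pool. If they were separate loci,
    each plant would typically express genes of both types, up to four in total.
    """
    return all(n_3d + n_0a == ploidy for n_3d, n_0a in counts.values())


print(consistent_with_single_locus(EXPRESSED))  # True
```

The check is necessary rather than sufficient evidence; the genetic mapping that follows provides the direct test.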
To test the hypothesis that the 0A S-RNase is allelic to the 3D S-RNase, we genetically mapped these genes using a diploid mapping population (FW). The female parent of this population, "The Fairy" (TF), has two 0A genes: one is identical to Rc0A_SC2 of Old Blush (hereafter SC2 for simplicity), and the other was designated S21 (Supplementary information 4). TF therefore has the S-genotype SC2/S21 at the 0A S-RNase. In the male parent, RW, we identified one 3D S-RNase-like gene, named S1w (Supplementary information 4), whose sequence is similar to Rc3D_SC1 of Old Blush (hereafter SC1 for simplicity). RW has the S-genotype S1w/Sx at the 3D S-RNase (Sx is an anonymous allele). The 0A and 3D S-RNase genes map to homologous positions on chromosome 3 of the female (TF3) and male (RW3) genetic linkage maps, respectively (Fig. 2a). Estimating the genomic position of the S gene from its map position (cM) on the integrated map (IT3) places it at about 41.5 Mbp on chromosome 3 (RC3) in the genome database of Hibrand-Saint Oyant et al. [5] (Fig. 2b) and at about 5.6 Mbp on chromosome 3 (Chr3) in the other genome database, that of Raymond et al. [6] (Fig. 2c). In the latter database, the SC1 S-RNase is located at 5.5 Mbp on Chr3, close to the estimated position (Fig. 2c).
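A sketch of how a map position (cM) can be converted to an approximate physical position (Mbp) by linear interpolation between flanking anchor markers is shown below. This is a common approach presented under stated assumptions, not necessarily the procedure used in this study; the anchor positions are invented for illustration. A negative slope between anchors corresponds to an assembly whose chromosome orientation is inverted relative to the genetic map.

```python
from bisect import bisect_left
from typing import List, Tuple


def interpolate_mbp(cm: float, anchors: List[Tuple[float, float]]) -> float:
    """Estimate a physical position (Mbp) from a genetic map position (cM).

    `anchors` are (cM, Mbp) pairs for markers placed on both the linkage map
    and the assembly, sorted by cM. Recombination is assumed to be uniform
    between neighbouring anchors, so the position is interpolated linearly.
    """
    cms = [c for c, _ in anchors]
    if not cms[0] <= cm <= cms[-1]:
        raise ValueError("map position lies outside the anchored interval")
    i = bisect_left(cms, cm)  # index of the first anchor with cM >= cm
    if cms[i] == cm:
        return anchors[i][1]
    (c0, m0), (c1, m1) = anchors[i - 1], anchors[i]
    return m0 + (cm - c0) * (m1 - m0) / (c1 - c0)


# Hypothetical anchors: the same two markers in two assemblies whose
# chromosome orientation is inverted relative to each other.
forward = [(20.0, 38.0), (50.0, 44.0)]
inverted = [(20.0, 9.0), (50.0, 3.0)]
print(interpolate_mbp(30.0, forward))   # 40.0
print(interpolate_mbp(30.0, inverted))  # 7.0
```

Because recombination rates vary along chromosomes, such estimates are only approximate, which is why the text checks the estimated position against the actual location of the SC1 S-RNase.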
The two haploid genome databases of R. chinensis "Old Blush" are inversely oriented (Fig. 3a). The estimated

The identification of the S-locus F-box genes
In an S-RNase-based SI system, pollen-expressed F-box genes are expected to flank the S-RNase. This prediction was confirmed by the genome database search and by the stamen transcriptomes of Old Blush (Table 2). In the 5.3–5.8 Mbp region of Chr3, we identified 12 F-box genes and one S-RNase (SC1). Homologous F-box genes were identified in the 41.3–41.9 Mbp region of RC3, and three further F-box genes were identified in the contig RC0 containing the SC2 S-RNase (Table 2). All of these F-box genes were expressed in the stamen but not in the pistil. Following the nomenclature of Kubo et al. [13], they were designated SLF or FBX (Supplementary Table S7-2). We also confirmed that a similar number of SLFs lie in the 500–600 kbp regions flanking the 3D S-RNase-like genes in the R. multiflora (S15 and S16 S-RNases) and R. rugosa (S18 and S20 S-RNases) genome databases, and that they are expressed in the stamens but not in the pistils (Table S7-1a,b). Figure 4 shows the structures of the putative S-locus regions identified in the Rosa genome databases. The S-locus of Old Blush chromosome RC3 contains no S-RNase but a poly-N region, and the adjacent ~10 kbp regions (Block-a, Block-b) are nearly identical in sequence to the end regions of the contig RC0: Block-a is 11 220 bp long with 98.8% identity between RC3 and RC0, and Block-b is 11 382 bp long with 97.5% identity. The SC2 S-RNase in the contig RC0 therefore appears to belong within the poly-N region of RC3. Furthermore, SLF7 is present in the contig RC0, and its orthologs were identified in the S-locus regions of R. rugosa and R. multiflora (see also Fig. S7-2 for the alignment). Finally, co-segregation of SLF5 (on RC3) and the SC2 S-RNase (on RC0) was tested in the FW mapping population (Supplementary information 8), and the two co-segregated perfectly (n = 97).
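Perfect co-segregation in 97 progeny bounds, rather than rules out, recombination between SLF5 and the SC2 S-RNase. A minimal sketch of the standard binomial bound follows, assuming that each progeny represents an independent, informative meiosis; the function name is hypothetical.

```python
def recombination_upper_bound(n: int, alpha: float = 0.05) -> float:
    """One-sided upper (1 - alpha) confidence bound on the recombination
    fraction r when 0 recombinants are observed among n progeny.

    Solves (1 - r) ** n = alpha: the largest r for which observing no
    recombinants still has probability of at least alpha.
    """
    return 1.0 - alpha ** (1.0 / n)


print(f"{recombination_upper_bound(97):.3f}")  # 0.030
```

The result, about 3%, is close to the familiar "rule of three" approximation 3/n.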

Validation of the S-RNase based S-genotyping by pollination experiments
To validate these S-RNase genes as the pistil determinants of SI, we performed pollination experiments on pairs of diploid roses sharing one S-allele (half-compatible pairs; Table 3). For example, plants with the S-genotype SC1/Sx (where Sx is an anonymous allele standing for several different S-RNase genotypes) were pollinated with pollen from plants with SC1/Sy (where Sy is another anonymous allele). After seed maturation, the S-genotypes of the seeds were determined, and the genotype frequencies of the pollen that fertilized the ovules were calculated. Of a total of 111 seeds analyzed, none had been sired by SC1 pollen (i.e. all seeds derived from pollen carrying Sy). Half-compatible crosses were also performed between R. multiflora plants sharing SC1-like S-RNase genes (Rm3D genes), i.e. the S7, S9, S11, or S13 S-RNase (Fig. 1c), and gave the same result as for SC1 (174 seeds in total; Table 3). Furthermore, genotyping of 60 seeds from the cross of TF (SC2/S21) with pollen of Old Blush (SC1/SC2) supported the prediction that the SC2 pollen of Old Blush is rejected by the TF pistil (Table 3). Crosses between R. multiflora plants sharing SC2-like S-RNase genes (Rm0A genes), i.e. the S6, S10, or S12 S-RNase (Fig. 1c), gave consistent results (124 seeds in total; Table 3). These data strongly support the hypothesis that the 3D and 0A S-RNase genes are the pistil determinants of SI in roses.
[Table caption fragment] … Table 1 for R. chinensis "Old Blush" and Supplementary Table S7-1a,b for R. rugosa and R. multiflora.
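The expected outcome of a half-compatible cross under gametophytic SI, and how unlikely the observed result would be without SI, can be sketched as follows. The sketch assumes 1:1 segregation of the pollen parent's two S-alleles and no selection other than SI; the function names are hypothetical.

```python
from typing import List, Tuple


def accepted_pollen(pistil: Tuple[str, str], pollen_parent: Tuple[str, str]) -> List[str]:
    """Pollen S-alleles not rejected under gametophytic SI.

    Each haploid pollen grain carries one S-allele of the pollen parent and is
    rejected if that allele matches either S-allele of the pistil.
    """
    return [s for s in pollen_parent if s not in pistil]


def prob_shared_allele_never_transmitted(n_seeds: int) -> float:
    """Chance that none of n seeds carries the shared paternal allele if SI
    were absent and the two paternal alleles were transmitted 1:1."""
    return 0.5 ** n_seeds


# Cross from the text: TF (SC2/S21) pollinated with Old Blush (SC1/SC2)
print(accepted_pollen(("SC2", "S21"), ("SC1", "SC2")))  # ['SC1']
print(f"{prob_shared_allele_never_transmitted(111):.1e}")  # 3.9e-34
```

Under these assumptions, finding no SC1-sired seed among 111 by chance alone is vanishingly unlikely, which is why seed genotyping gives much stronger evidence than fruit set alone.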

Linkage between the S-locus and important ornamental traits
To assess the effect of SI on rose breeding, we tested the degree of genetic linkage between the S-locus and the genes underlying two important ornamental characteristics, CF and DF. The KSN gene (controlling CF) is located 13.5 Mbp from the S-locus, and the AP2-like gene (controlling DF) is located 9.0 Mbp from the S-locus (Table 4). Recombination frequencies between the S-locus and these genes were estimated in two diploid F1 hybrid populations [19,41]. In the FW population, the recombination frequency between S and KSN was 20% in RW, and that between S and AP2-like was 13% in RW and 40% in TF. In the 94/1 population, the recombination frequency between S and AP2-like was 22% in the parent 93/1-119. Except for the high recombination between the S and AP2-like loci in TF, there is significant genetic linkage between the S and KSN loci and between the S and AP2-like loci. This linkage may act as an internal constraint on rose breeding, as discussed later.
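For readers who want to reproduce estimates of this kind, a minimal sketch of a recombination frequency with its binomial standard error, plus an exact one-sided binomial test against the null hypothesis of no linkage (r = 0.5), is given below. The counts (20 recombinants among 100 progeny) are hypothetical and are not the data behind Table 4.

```python
import math
from typing import Tuple


def recombination_frequency(recombinants: int, n: int) -> Tuple[float, float]:
    """Recombination frequency r and its binomial standard error."""
    r = recombinants / n
    return r, math.sqrt(r * (1 - r) / n)


def linkage_p_value(recombinants: int, n: int) -> float:
    """One-sided exact binomial P(X <= recombinants) under no linkage (r = 0.5)."""
    return sum(math.comb(n, k) for k in range(recombinants + 1)) / 2 ** n


# Hypothetical counts, for illustration only
r, se = recombination_frequency(20, 100)
print(f"r = {r:.2f} +/- {se:.2f}")       # r = 0.20 +/- 0.04
print(linkage_p_value(20, 100) < 0.001)  # True
```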

Sc-alleles of old Chinese CF cultivars
To identify the Chinese S-alleles (SC) introduced into Europe in the 18th century and to trace their fate over the past 200 years of rose breeding, we first analyzed the S-genotypes of old Chinese CF cultivars (Table 5). S-genotyping showed that the old Chinese cultivars frequently share the SC1 and SC2 alleles with Old Blush (Table 5). Hume's Blush Tea-scented China has the same S-genotype (SC1/SC2) as Old Blush. The other CF cultivars, such as Slater's Crimson China, R. chinensis "Mutabilis", and R. chinensis, carry only one of SC1 and SC2, indicating that they have additional, unidentified S-alleles. These unidentified SC-alleles were amplified by RT-PCR from pistil mRNA with degenerate primers for rose S-RNases and then sequenced (Fig. S10-1). Partial (SC3, SC4) and full-length (SC5) S-RNase cDNA sequences were obtained and analyzed by BLASTX searches against a local protein database of the Rosa S-RNase and S-RNase-like proteins shown in Fig. 1c. The BLAST search confirmed that the SC3–SC5 sequences are closest to the 0A S-RNase proteins (see Fig. S10-2 for the alignment). Accordingly, the S-genotypes were estimated as follows: Mutabilis = SC1/SC5, R. chinensis = SC1/SC4, Slater's Crimson China = SC1/SC3 or SC1/SC2/SC3 (triploid type), and Sanguinea = SC2/SC3 (Table 5).
Genotyping of the KSN gene responsible for the CF characteristic identified five KSN genotypes (Table 5). Three KSN alleles, i.e. KSN-W (the wild allele), ksn-copia (the copia-retrotransposon-inserted, non-functional allele) [10], and ksn-null (the deletion, non-functional allele) [5], were distinguished by PCR. New primers were designed to amplify the ksn-null allele (Fig. S10-3). Old Blush, Hume's Blush Tea-scented China, and Mutabilis are heterozygous for the two non-functional alleles (ksn-copia/ksn-null), whereas Slater's Crimson China and R. chinensis are homozygous for one non-functional allele (ksn-copia/ksn-copia), and Sanguinea is homozygous for the other (ksn-null/ksn-null). All other old Chinese cultivars with once-flowering (OF) behaviour carry the wild KSN allele (KSN-W). The ap2 allele (a transposon-inserted allele) of AP2-like [5,11] is present in all old Chinese cultivars with a double-flower phenotype (Table 5).
Table 3. Validation of the S-RNase-based genotyping by pollination experiments with half-compatible pairs. The common S-alleles shared by parents are underlined. SC1 is the S-RNase on Chr3 of the Old Blush genome database [6], and SC2 is the new S-RNase identified in RC0 of the Old Blush genome database [5]. Other S-alleles are S-RNase genes identified in R. multiflora; S7, S9, S11, and S13 are orthologous to SC1, and S6, S10, and S12 are orthologous to SC2 (Fig. 1c).

Introgression of Chinese S-alleles into European roses
A total of 153 rose cultivars with a variety of breeding histories were genotyped for ksn and ap2 and the associated SC alleles (see Supplementary data Table D6 for the list of cultivars). The frequencies of cultivars carrying these alleles were calculated for each breeding period (Fig. 5). Only 25% of European roses bred before 1850 had either ksn-copia or ksn-null. In particular, roses in the Gallica, Damask, Centifolia, and Alba groups have neither ksn-copia nor ksn-null and show no sign of introgression from Chinese roses (Table S10-1). In contrast, some old roses in the Moss group, such as Mousseline (bred in 1855), had ksn alleles, indicating the onset of introgression from Chinese CF roses in this period. These roses also had SC1 or SC2, demonstrating parallel introgression of SC into European roses.
The frequency of roses carrying ksn alleles increased dramatically to 86% among roses bred in 1850–1900 and then gradually to 100% among roses bred in 1980–2020 (Fig. 5). In contrast, the frequency of roses carrying SC peaked (95%) among roses bred in 1850–1900, decreased slightly to 83% among roses bred in 1900–1940, and changed little thereafter (Fig. 5). Because sample sizes are small, these changes in frequency after period II (1850–1900) are not statistically significant. The frequency of cultivars with ksn in period II (86%) does not differ significantly from that in period V (1980–2020; 100%) (Fisher's exact test, p = 0.0507), and the frequency of cultivars with SC in period II (95%) does not differ significantly from that in period V (82%) (Fisher's exact test, p = 0.2316).
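The Fisher's exact tests above compare two breeding periods in 2 × 2 tables (cultivars with versus without the allele). A self-contained sketch of the two-sided test is shown below; it should agree with `scipy.stats.fisher_exact` under its default two-sided alternative. The example uses Fisher's classic tea-tasting table, not the cultivar counts behind Fig. 5, which are not given here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more probable than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def p_table(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = p_table(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [p_table(x) for x in range(lo, hi + 1)]
    # small relative tolerance so that tables tied with the observed one are kept
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


# Fisher's tea-tasting table (not data from this study)
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

With small groups, even sizeable differences in percentage (e.g. 86% versus 100%) can fall short of significance, which is the point made in the text.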
SC1 and SC2 are the major SC alleles, and the frequency of roses carrying SC3 increased during the 19th century (Fig. 5c). In contrast, roses carrying SC4 or SC5 occur at low frequency (<10%) in modern cultivars.
The frequency of roses carrying ap2 changed across breeding periods in the same way as that of ksn (Fig. 5), suggesting that the ap2 allele also originated from Rosa chinensis "Single white-eye" Chinese roses. Old European roses with the DF phenotype, such as Rosa gallica officinalis, Quatre Saisons, and Chapeau de Napoleon, had no ap2 allele (Table S10-1), indicating that another genetic factor underlies the DF phenotype in these roses.

The wild origin of SC-alleles
By screening a total of 95 plants from 25 wild Rosa species (Table S11-1) with SC-specific PCRs, we identified putative wild ancestors of SC1–SC4 (Table 6). Positive PCR amplification with SC1-specific primers was observed for two species, R. multiflora and Rosa brunonii (Table S11-2). Sequencing of the PCR products showed that the R. multiflora sequences are 100% identical to the SC1 sequence of Old Blush, whereas R. brunonii has 98.4% identity with SC1 (8 SNPs per 500 bp). The RNA-seq analysis of pistil-expressed genes had already shown that three R. multiflora plants (Rm1, Rm2, Rm3) carry the SC1 S-RNase (= Rm3D_SC1 in Fig. 1c), whose amino acid sequence is 100% identical to that of Old Blush SC1 (= Rc3D_SC1). The cDNA sequence of Rm3D_SC1 is also 100% identical to Rc3D_SC1 (Supplementary data Table D4).
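The identities quoted in this section follow directly from the SNP counts when the alignment is ungapped and each SNP is one mismatched site. A minimal sketch (the function names are hypothetical):

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identity of two pre-aligned, equal-length, ungapped sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal in length and non-empty")
    matches = sum(x == y for x, y in zip(seq1.upper(), seq2.upper()))
    return 100.0 * matches / len(seq1)


def identity_from_snps(n_snps: int, length: int) -> float:
    """Percent identity when each SNP is one mismatched site in `length` bp."""
    return 100.0 * (length - n_snps) / length


print(f"{identity_from_snps(8, 500):.1f}")  # 98.4 (R. brunonii vs SC1)
print(f"{identity_from_snps(4, 221):.1f}")  # 98.2 (R. rubus vs SC5)
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))  # 90.0
```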
No wild roses gave positive PCR amplification with SC2-specific primers (Supplementary data Table D6). However, the resequenced genome of an R. gigantea individual (SRR6175515) contains a sequence 100% identical to the SC2 S-RNase of Old Blush (Fig. S11-2). For the SC3 S-RNase isolated from Slater's Crimson China, SC3-specific primers amplified products from three species, R. chinensis var. spontanea, R. gigantea, and R. multiflora var. cathayensis (Table 6). Sequencing of the PCR products showed that only R. chinensis var. spontanea has a sequence 100% identical to SC3 of Slater's Crimson China (Table S11-3). For the SC4 S-RNase isolated from R. chinensis, SC4-specific primers amplified products from three species, R. gigantea, Rosa soulieana, and Rosa helenae (Table 6). Sequencing of the PCR products showed that only one plant (pink-flowered type) of R. gigantea has a sequence 100% identical to SC4 (Table S11-4). For the SC5 S-RNase isolated from Mutabilis, SC5-specific primers gave a positive amplification only from a wild individual of R. rubus (Table 6), but sequencing showed that this PCR product is not identical to SC5 of Mutabilis (4 SNPs per 221 bp; Fig. S11-3). The putative genetic connections inferred from shared S-alleles are summarized in Figure 6.

Identification and validation of the Rosa S-locus
The Rosa S-locus was identified and confirmed by resolving several open issues from previous studies. The genome sequencing of R. chinensis "Old Blush" proposed a candidate region for the S-locus [5], which corresponds to the region of the 3C gene (Fig. 1a). We concluded that the 3C gene is not the true S-RNase because of (i) its low or undetectable expression in the pistil (Fig. 1b) and (ii) the low sequence divergence between alleles and among individuals (Fig. 1c). Vieira et al. [15] analyzed the other genome of Old Blush [6] and first reported that the 3D gene is the Rosa S-RNase; their S-RNase "Rchinensis1_3-Rchinensis2_27" corresponds to the 3D gene (= SC1 S-RNase) in this study. Vieira et al. [15] validated the S-RNase on the basis that no fruit set when a few individuals were pollinated by other individuals with the same S-genotypes. This observation cannot exclude the possibility that the lack of fruit set resulted from inbreeding depression. We provide strong evidence that the 3D S-RNase is the pistil determinant of SI in roses by genotyping more than 400 seeds produced by a number of half-compatible pairs of roses (Table 3).
This study also identified the 0A gene (= SC2 S-RNase) in the contig RC0 of the Old Blush genome database (Fig. 1c) and, through genetic mapping (Figs. 2 and 3), suggested that it is the previously unidentified allele of the SC1 S-RNase of Old Blush. Sequence analysis of the Rosa S-locus suggested that the contig RC0 can be placed into a poly-N region of the S-locus on RC3 (Fig. 4). The perfect co-segregation of SLF5 on RC3 and the SC2 S-RNase on RC0 in a mapping population (n = 97; see Supplementary information 8) supports this hypothesis. Furthermore, we identified SC2 S-RNase-like genes in the pistil transcriptomes of wild individuals of R. multiflora and R. rugosa (Rm0A and Rg0A genes; Table 1). Pollination experiments using half-compatible pairs of roses sharing SC2 or SC2 S-RNase-like genes indicated that these genes are pistil determinants of SI in the rose (Table 3). To determine the precise location of the SC2 S-RNase in the S-locus of Old Blush, a BAC library will need to be constructed and sequenced, as Liang et al. [42] did in their analysis of the S-locus structure of Citrus.
Based on the SLFs flanking the S-RNase, we estimated that the S-locus of Old Blush spans approximately 500 kbp (Fig. 4). Chen et al. [16] reported that the S-locus of R. rugosa spans 667 kbp and includes one S-RNase and 19 F-box genes (Table S7-1b). Further co-segregation analyses of SLFs and S-RNase are required to confirm the physical size of the Rosa S-locus. Analysis of the evolutionary divergence of SLFs in comparison with S-RNase (Fig. S7-3) suggests that SI in Rosa is controlled by the non-self recognition system, as previously reported by Vieira et al. [15]. In the non-self recognition system, polyploidization is expected to break down SI [12,13]. In support of this prediction, we found that colchicine-induced chromosome doubling of a diploid rose resulted in a self-compatible tetraploid (Table S6-1).
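The expected breakdown of SI after chromosome doubling can be illustrated by enumerating the diploid pollen produced by an autotetraploid. This sketch assumes random pairing of the four homologues with no double reduction, and further assumes, following the competitive-interaction explanation often associated with non-self recognition, that heteroallelic pollen (carrying two different S-alleles) escapes rejection; S1 and S2 are placeholder allele names.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def diploid_gametes(genotype: Sequence[str]) -> Counter:
    """Diploid gametes of an autotetraploid, assuming random chromosome
    segregation without double reduction: every 2-of-4 homologue pair is equally likely."""
    return Counter(tuple(sorted(pair)) for pair in combinations(genotype, 2))


gametes = diploid_gametes(("S1", "S1", "S2", "S2"))  # doubled S1/S2 diploid
total = sum(gametes.values())
heteroallelic = sum(n for g, n in gametes.items() if g[0] != g[1])
print(gametes[("S1", "S2")], total)    # 4 6
print(f"{heteroallelic / total:.3f}")  # 0.667
```

Under these assumptions about two-thirds of the self pollen would be compatible, consistent with the self-compatible tetraploid reported in Table S6-1.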

Insights into the breeding of CF roses
Estimates of the degree of genetic linkage between the S-locus and ksn (controlling CF) and between the S-locus and ap2 (controlling DF) indicate weak but significant linkage. Because CF is a recessive characteristic, a CF rose (ksn homozygote) cannot be obtained if ksn remains linked to the same S-allele in both parents. For example, when a CF (ksn-homozygous) seed parent is crossed with a pollen parent whose ksn is linked to the S-allele shared by the two parents, only recombinant pollen can deliver ksn, so the 20% recombination frequency (Table 4) implies that only about 20% of the seedlings are expected to be CF. Information on the S-alleles linked with ksn is therefore helpful for effective breeding of CF roses. The SI constraint on breeding should be weaker for DF than for CF: because DF is a dominant characteristic, one ap2 allele is enough to produce a DF rose [5,11]. Furthermore, we suggest that other genes or alleles for DF already existed in European roses before the introgression of ap2 from China (Table S10-1).
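The ~20% figure can be reproduced with a small two-locus model. This is a minimal sketch assuming diploid parents, a single recombination fraction r between S and KSN, gametophytic SI that rejects pollen whose S-allele matches either pistil allele, and no other selection; the allele labels S1–S3 and all function names are placeholders.

```python
from typing import Dict, Tuple

Haplotype = Tuple[str, str]  # (S-allele, KSN allele), e.g. ("S1", "ksn")


def gamete_frequencies(h1: Haplotype, h2: Haplotype, r: float) -> Dict[Haplotype, float]:
    """Gametes of a diploid h1/h2 with recombination fraction r between S and KSN."""
    freqs: Dict[Haplotype, float] = {}
    for hap, p in [(h1, (1 - r) / 2), (h2, (1 - r) / 2),
                   ((h1[0], h2[1]), r / 2), ((h2[0], h1[1]), r / 2)]:
        freqs[hap] = freqs.get(hap, 0.0) + p
    return freqs


def expected_cf_fraction(seed_parent: Tuple[Haplotype, Haplotype],
                         pollen_parent: Tuple[Haplotype, Haplotype], r: float) -> float:
    """Expected fraction of ksn/ksn (CF) seedlings under gametophytic SI."""
    pistil_s = {seed_parent[0][0], seed_parent[1][0]}
    # Ovules are not filtered by SI, so the maternal ksn probability is 0, 0.5 or 1.
    p_maternal_ksn = sum(h[1] == "ksn" for h in seed_parent) / 2
    pollen = gamete_frequencies(pollen_parent[0], pollen_parent[1], r)
    # Pollen whose S-allele matches either pistil allele is rejected.
    accepted = {h: p for h, p in pollen.items() if h[0] not in pistil_s}
    total = sum(accepted.values())
    if total == 0:
        return 0.0  # fully incompatible cross: no seeds
    p_paternal_ksn = sum(p for h, p in accepted.items() if h[1] == "ksn") / total
    return p_maternal_ksn * p_paternal_ksn


# Pollen parent: ksn linked to S1, which it shares with the seed parent.
pollen_parent = (("S1", "ksn"), ("S3", "KSN"))
cf_seed_parent = (("S1", "ksn"), ("S2", "ksn"))   # CF (ksn/ksn)
het_seed_parent = (("S1", "ksn"), ("S2", "KSN"))  # ksn carrier
print(f"{expected_cf_fraction(cf_seed_parent, pollen_parent, 0.2):.2f}")   # 0.20
print(f"{expected_cf_fraction(het_seed_parent, pollen_parent, 0.2):.2f}")  # 0.10
```

With a heterozygous seed parent the expected fraction halves, so the ~20% figure is best read as the case of a CF seed parent.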
Further studies are necessary to confirm and quantify the genetic linkage between the S-locus and the loci controlling important ornamental traits, such as CF, DF, thornlessness, and resistance to black spot disease. Our estimates are based on only a few hybrid populations (Table 4), which may not represent the diversity of rose cultivars used for breeding. In addition, the low recombination rates in regions adjacent to breakpoints in inversion heterozygotes [43] might result in stronger genetic linkage of the S-locus with ksn-null than with ksn-copia.
As expected, SC alleles were introduced into European roses in the 18th century together with the CF allele ksn. However, the frequency of cultivars carrying SC remains high (>80%) in modern cultivars, although it has tended to decrease from 95% among cultivars bred in 1850–1900 (Fig. 5a). Because many rose cultivars carry the same SC alleles, S-genotyping before crossing can make diploid rose breeding more effective. Furthermore, because SC4 and SC5 are rare (<10%) in modern cultivars (Fig. 5c), the original Chinese CF cultivars carrying SC4 (R. chinensis) and SC5 (Mutabilis) remain useful breeding materials for introducing these rare S-alleles into CF cultivars.
Because SI can break down upon polyploidization (Table S6-1), breeding between polyploid cultivars may not need to take SI constraints into account. However, a substantial portion (22–35%) of modern cultivars are estimated to be diploid [44,45], and most wild species and old Chinese cultivars are diploid. Diploid breeding therefore remains an important part of rose breeding.

Insights into the wild origin of CF roses
We identified putative ancestors of the SC alleles of old Chinese cultivars by screening wild roses with SC-specific primers (Table 6). The results support the hybrid origin of the old Chinese cultivars from R. chinensis var. spontanea and R. gigantea in section Chinenses (Indicae), with introgression from R. multiflora in section Synstylae (Fig. 6). The genome sequencing of Old Blush revealed signs of introgression from section Synstylae [6], but which species of the section contributed to the Old Blush genome was not clarified. Yang et al. [30] proposed R. multiflora as a candidate donor of this introgression, but the section contains many other candidates. We investigated nine candidate species of section Synstylae in southwestern China (Table 6) and found that only R. multiflora carries an SC1 allele identical to that of Old Blush, providing new evidence for a genetic link between R. multiflora and old Chinese cultivars.

Genome-wide identification of candidate S-RNase
Using S-RNases of other Rosaceae crops, including Malus domestica (AAA79841.1), Prunus dulcis (AAL35960.2), Prunus avium (BAA36389.1, CAC27788.1), and Prunus persica (BAF42768.1), as queries, the Old Blush genome databases [5,6] were searched with TBLASTN to identify candidate S-RNase genes. All genomic regions with significant hits (E-value < 10^-10), including unannotated regions, were listed and manually annotated to infer the coding regions. The genome databases of R. multiflora [7] and R. rugosa [16,39] were then searched using the candidate S-RNase genes of Old Blush as queries to identify their orthologous genes. For phylogenetic reconstruction of S-RNase-like genes in roses, deduced amino acid sequences were aligned using MUSCLE, and the alignment was then converted back to DNA sequences. A maximum-likelihood tree was constructed from the resulting nucleotide alignment of the protein-coding sequences using FastTree [37,38] under the Jukes-Cantor model of nucleotide evolution.
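Converting an amino acid alignment back to DNA amounts to replacing each aligned residue with its codon from the corresponding CDS and each gap with three gap characters, so that codon boundaries are preserved in the nucleotide alignment. The sketch below assumes '-' as the gap symbol and CDSs that begin at the first codon; the function name and toy sequences are hypothetical, and a real pipeline should also check that each codon translates to its aligned residue.

```python
def to_codon_alignment(aligned_protein, cds):
    """Thread the codons of an ungapped CDS onto an aligned protein sequence.

    Each residue is replaced by its codon and each gap '-' by '---'. Codons
    left over after the last residue (e.g. a stop codon) are dropped.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues > len(codons):
        raise ValueError("CDS is shorter than the protein sequence")
    out, k = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


# Toy example (hypothetical sequences, not rose S-RNases).
protein_alignment = {"seqA": "MK-W", "seqB": "M-RW"}
cds = {"seqA": "ATGAAATGGTAA", "seqB": "ATGCGTTGGTAA"}
codon_alignment = {name: to_codon_alignment(protein_alignment[name], cds[name])
                   for name in protein_alignment}
print(codon_alignment)
```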

RNA-seq analysis
RNA sequencing (RNA-seq) was used to confirm the expression of the candidate S-RNase genes in the pistil and to isolate new S-RNase alleles. Flower buds were collected one to a few days before anthesis from three individuals of Old Blush, eight individuals of R. multiflora, and one individual of R. rugosa. Pistils and stamens were dissected from the buds and immediately frozen in liquid nitrogen. Total RNA was extracted using a commercial kit according to the protocol described by Dubois et al. [46], and RNA-seq data were obtained by poly(A) selection and 150 bp paired-end sequencing. The reads were mapped to the CDSs of specific genes and to the whole-genome data to calculate FPKM (fragments per kilobase of exon per million mapped reads) values. To analyze data from wild R. multiflora and R. rugosa plants, which lack specific reference genome databases, the RNA-seq reads were first assembled to construct a local database of mRNA sequences, which was then searched by BLAST using the Old Blush genes as queries to identify orthologous genes. The RNA-seq data were assembled using the Velvet and Tadpole assemblers with default parameters in Geneious Prime 2020.
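For reference, FPKM normalizes the fragment count of a gene by its length and by sequencing depth: FPKM = 10^9 × F / (L × N), where F is the number of fragments (read pairs, for paired-end data) mapped to the gene, L its length in bp, and N the total number of mapped fragments. A minimal sketch with hypothetical gene names and counts:

```python
def fpkm(fragment_counts, lengths_bp, total_mapped_fragments):
    """FPKM = fragments * 1e9 / (feature length in bp * total mapped fragments)."""
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped_fragments)
        for gene, count in fragment_counts.items()
    }


# Hypothetical counts for two genes in one pistil library.
counts = {"S-RNase_candidate": 5000, "control_gene": 200}
lengths = {"S-RNase_candidate": 700, "control_gene": 2000}
print(fpkm(counts, lengths, total_mapped_fragments=20_000_000))
```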

Genetic mapping of the new S-RNase in chromosome zero
The newly identified S-RNase, located on a contig not assigned to any of the seven chromosomes, was genetically mapped using an F1 diploid mapping population (FW) [19,47,48]. Three new genetic markers linked to the candidate S-RNase were added to the previous map to estimate the map position of the S-gene (Supplementary information 4).
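Placing markers on a linkage map requires converting pairwise recombination fractions r into map distances. The text does not state which map function was used; as an assumption for illustration, the sketch below shows two common choices, Kosambi, d = 25 ln[(1 + 2r)/(1 - 2r)] cM, and Haldane, d = -50 ln(1 - 2r) cM, both valid for 0 ≤ r < 0.5.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def haldane_cm(r):
    """Haldane map distance (cM), assuming no crossover interference."""
    if not 0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1 - 2 * r)


for r in (0.05, 0.1, 0.2):
    print(f"r={r}: Kosambi {kosambi_cm(r):.1f} cM, Haldane {haldane_cm(r):.1f} cM")
```

For small r both functions give roughly 100r cM; they diverge as r grows because Kosambi allows for crossover interference.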

SLF identification
ORFs longer than 1 kbp were extracted from the 1 Mbp genomic regions surrounding the S-RNase in the Old Blush, R. multiflora, and R. rugosa genomes, and F-box genes were identified among these ORFs by BLAST searches.
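The ORF extraction step can be sketched as a six-frame scan for ATG-initiated reading frames that end in an in-frame stop codon. This is a minimal illustration under stated assumptions, not the authors' script: ORF length runs from ATG to the stop codon inclusive, the first ATG after the previous stop is used (giving the longest ORF per stop codon), ORFs running off the end of the sequence are ignored, and coordinates are reported 0-based and half-open on the forward strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_len=1000):
    """Return (strand, start, end) for ORFs longer than min_len nucleotides."""
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3
                    if end - start > min_len:
                        # Report coordinates on the forward strand.
                        orfs.append((strand, start, end) if strand == "+"
                                    else (strand, n - end, n - start))
                    start = None
    return orfs


# Hypothetical 1,206 nt ORF on the forward strand.
print(find_orfs("ATG" + "AAA" * 400 + "TAA"))
```

The translated ORFs would then serve as BLAST queries to pick out F-box genes.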

Validation of the S-RNase based S-genotyping by pollination experiments
A total of 20 pairs of diploid roses sharing one S-allele (i.e., half-compatible pairs) were used in pollination experiments to validate the candidate S-RNase gene. Old Blush (SC1/SC2), The Fairy (SC2/S21), R. chinensis "Single whiteeye" (SC1/S12), and 11 wild individuals of R. multiflora were selected based on their S-genotypes. Flower buds were bagged before anthesis to prevent open pollination. Petals and anthers were removed at the balloon stage, and outcross pollen grains were applied to the exposed stigmas. Pollinations were carried out from April to May in 2018 and 2020, and mature fruits were collected from September to October of the same years. The fruits were opened in the laboratory, the achenes (seeds) were collected, and the S-genotypes of the seeds were determined by PCR (Supplementary information 5).
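The logic of this validation can be made explicit: if the S-RNase is the pistil determinant, pollen carrying the S-allele shared by both parents should be rejected, so every seed should carry the pollen parent's non-shared allele. The sketch below computes the expected seed S-genotypes for a half-compatible cross and flags observed genotypes that contradict this expectation; the function names, the direction of the cross, and the observed seed genotypes are hypothetical.

```python
def expected_seed_genotypes(seed_parent_s, pollen_parent_s):
    """Seed S-genotypes expected under gametophytic SI.

    Only pollen whose S-allele is absent from the seed parent can fertilize,
    so each seed combines one maternal allele with a non-shared paternal one.
    """
    accepted_pollen = set(pollen_parent_s) - set(seed_parent_s)
    return {tuple(sorted((ovule, grain)))
            for ovule in seed_parent_s for grain in accepted_pollen}


def unexpected_seeds(seed_parent_s, pollen_parent_s, observed):
    """Observed seed genotypes that contradict the SI expectation."""
    expected = expected_seed_genotypes(seed_parent_s, pollen_parent_s)
    return [g for g in observed if tuple(sorted(g)) not in expected]


# Old Blush (SC1/SC2) as seed parent x The Fairy (SC2/S21) as pollen parent;
# the observed seed genotypes are hypothetical.
old_blush, the_fairy = ("SC1", "SC2"), ("SC2", "S21")
print(sorted(expected_seed_genotypes(old_blush, the_fairy)))
print(unexpected_seeds(old_blush, the_fairy, [("SC1", "S21"), ("SC2", "SC2")]))
```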

Linkage between the S-locus and important ornamental traits
Two F1 diploid mapping populations (FW [19] and 94/1 [41]) were used to estimate the recombination frequencies between the S-locus and the genes controlling CF (KSN) and DF (AP2-like). The S-locus, KSN, and AP2-like were genotyped by PCR (Supplementary information 9), and recombination frequencies were calculated.
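In an F1 population, the recombination frequency between two loci can be estimated, for offspring in which the haplotype inherited from a heterozygous parent can be resolved, as the proportion of non-parental allele combinations. The sketch below assumes that parent's linkage phase is known and that each offspring's inherited (S-allele, KSN allele) pair has been scored; it also reports the binomial standard error sqrt(r(1 - r)/n). Names and data are hypothetical.

```python
import math


def recombination_frequency(offspring, parental_haplotypes):
    """Estimate r and its standard error from scored offspring.

    offspring: list of (S_allele, KSN_allele) pairs inherited from one
        heterozygous parent.
    parental_haplotypes: that parent's two haplotypes (known phase).
    """
    if not offspring:
        raise ValueError("no offspring scored")
    parental = {tuple(h) for h in parental_haplotypes}
    n = len(offspring)
    recombinants = sum(1 for h in offspring if tuple(h) not in parental)
    r = recombinants / n
    se = math.sqrt(r * (1 - r) / n)
    return r, se


# Hypothetical scores: 8 parental and 2 recombinant offspring.
phase = [("Sa", "ksn"), ("Sb", "KSN")]
scored = [("Sa", "ksn")] * 4 + [("Sb", "KSN")] * 4 + [("Sa", "KSN"), ("Sb", "ksn")]
r, se = recombination_frequency(scored, phase)
print(f"r = {r:.2f} +/- {se:.2f}")
```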

Introgression of the Chinese S-alleles into modern roses
A total of 153 rose cultivars covering a wide range of breeding periods and classifications were selected, and young leaves were collected from 20 rose gardens and nurseries in Japan in 2014–2020 (Supplementary data Table D6). DNA was extracted from the young leaves using the NucleoSpin Plant II kit (Macherey-Nagel) according to the manufacturer's protocol. PCR was performed to test whether the roses carried specific S-alleles, the mutated allele ksn, the wild-type allele KSN-W, or the mutated allele ap2, using EmeraldAmp PCR Master Mix (TaKaRa) with thermal cycling: (1) Table D1. To identify other Chinese S-alleles originally linked with ksn, we used three old Chinese CF cultivars (Slater's Crimson China, Mutabilis, and Rosa chinensis), extracted RNA from their pistils, and performed RT-PCR with primer sets designed on conserved sites of S-RNase. Partial or full sequences of three new S-alleles were determined and named SC3, SC4, and SC5, and specific primers were designed for each (Supplementary information 10).
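The change in the frequency of cultivars carrying SC alleles across breeding periods (Fig. 5a) is a simple proportion per period. A minimal sketch, assuming each cultivar is recorded as a breeding period plus the set of S-alleles detected by PCR; the records below are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict

SC_ALLELES = {"SC1", "SC2", "SC3", "SC4", "SC5"}


def sc_frequency_by_period(records):
    """records: iterable of (breeding_period, detected_S_alleles).

    Returns {period: fraction of cultivars carrying at least one SC allele}.
    """
    counts = defaultdict(lambda: [0, 0])  # period -> [with SC, total]
    for period, alleles in records:
        counts[period][1] += 1
        if SC_ALLELES & set(alleles):
            counts[period][0] += 1
    return {period: hits / total for period, (hits, total) in counts.items()}


# Hypothetical placeholder records (not the 153 cultivars of this study).
records = [
    ("1850-1900", {"SC1", "SC2"}),
    ("1850-1900", {"S12", "S21"}),
    ("modern", {"SC2", "SC4"}),
]
print(sc_frequency_by_period(records))
```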

Wild roses carrying the same S-alleles as old cultivars in China
Wild ancestral species were inferred by screening wild roses for the same S-alleles as those of the old Chinese CF cultivars. A total of 25 Rosa species were surveyed (Table S11-